Lexical token alignment: experiments, results and applications

Authors

  • Dan Tufis
  • Ana-Maria Barbu
Abstract

Lexical alignment is one of the most challenging tasks in processing and exploiting parallel texts. Numerous applications may benefit from an accurate multilingual lexical alignment of bi- and multi-language corpora. In this paper we describe a hypothesis-testing approach to the automatic extraction of translation equivalents from sentence-aligned and tagged parallel corpora. The algorithm was used for the automatic extraction of 6 bilingual lexicons with English as the source language and Bulgarian, Czech, Estonian, Hungarian, Romanian and Slovene as the target languages, as well as a 7-language lexicon with English as the hub for the other 6 CEE languages. For the experiments described here we used the 7-language aligned corpus based on Orwell’s novel “1984”.
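
The abstract does not say which statistical test the authors use, so the following is only a minimal sketch of what a hypothesis-testing extraction of translation equivalents could look like. It assumes a log-likelihood (G²) test on a 2x2 contingency table of sentence-pair co-occurrences, restricts candidates to tokens with the same part-of-speech tag, and uses 3.84 (the chi-square critical value for p = 0.05 with one degree of freedom) as the cut-off. The function names, the data layout and the toy English-Romanian corpus are all invented for illustration.

```python
import math
from collections import Counter
from itertools import product


def log_likelihood(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts."""
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22
    c1, c2 = k11 + k21, k12 + k22

    def term(k, row, col):
        # A zero cell contributes nothing (limit of k * log k as k -> 0).
        return k * math.log(k * n / (row * col)) if k > 0 else 0.0

    return 2.0 * (term(k11, r1, c1) + term(k12, r1, c2)
                  + term(k21, r2, c1) + term(k22, r2, c2))


def extract_equivalents(aligned_pairs, threshold=3.84):
    """Return (source, target, score) triples that pass the test.

    aligned_pairs: list of (source_sentence, target_sentence); each
    sentence is a list of (lemma, pos_tag) tuples (assumed layout).
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    n = len(aligned_pairs)
    for src_sent, tgt_sent in aligned_pairs:
        src_set, tgt_set = set(src_sent), set(tgt_sent)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        for s, t in product(src_set, tgt_set):
            if s[1] == t[1]:  # assumption: equivalents share a POS tag
                pair_count[(s, t)] += 1

    results = []
    for (s, t), k11 in pair_count.items():
        k12 = src_count[s] - k11       # source token without target token
        k21 = tgt_count[t] - k11       # target token without source token
        k22 = n - k11 - k12 - k21      # neither token present
        if k11 * k22 <= k12 * k21:     # skip non-positive association
            continue
        score = log_likelihood(k11, k12, k21, k22)
        if score >= threshold:         # reject the independence hypothesis
            results.append((s[0], t[0], score))
    return sorted(results, key=lambda r: -r[2])


# Toy sentence-aligned, tagged corpus (invented data, not from the paper).
toy_corpus = [
    ([("time", "N"), ("be", "V"), ("peace", "N")],
     [("timp", "N"), ("fi", "V"), ("pace", "N")]),
    ([("freedom", "N"), ("be", "V"), ("power", "N")],
     [("libertate", "N"), ("fi", "V"), ("putere", "N")]),
    ([("time", "N"), ("pass", "V")],
     [("timp", "N"), ("trece", "V")]),
    ([("peace", "N"), ("come", "V")],
     [("pace", "N"), ("veni", "V")]),
]
lexicon = extract_equivalents(toy_corpus)
```

On such a tiny corpus the test cannot separate words that only ever occur together (here "freedom" and "power" each pair equally well with "libertate" and "putere"), which is one reason a real system would need far more data and additional filtering than this sketch provides.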

Similar resources

Spontaneous lexical alignment in children with an autistic spectrum disorder and their typically developing peers.

It is well established that adults converge on common referring expressions in dialogue, and that such lexical alignment is important for successful and rewarding communication. The authors show that children with an autistic spectrum disorder (ASD) and chronological- and verbal-age-matched typically developing (TD) children also show spontaneous lexical alignment. In a card game, both groups t...

Effect of lexical status on phonetic categorization.

To investigate the interaction in speech perception between lexical knowledge (in particular, whether a stimulus token makes a word or nonword) and phonetic categorization, sets of [bVC]-[dVC] place-of-articulation continua were constructed so that the endpoint tokens represented word-word, word-nonword, nonword-word, and nonword-nonword combinations. Experiment 1 demonstrated that ambiguous to...

Evaluation of Methods for Sentence and Lexical Alignment of Brazilian Portuguese and English Parallel Texts

Parallel texts, i.e., texts in one language and their translations into other languages, are very useful for many applications such as machine translation and multilingual information retrieval. If these texts are aligned at the sentence or lexical level, their relevance increases considerably. In this paper we describe some experiments that have been carried out with Brazilian Portuguese ...

Nonnative Processing of Verbal Morphology: In Search of Regularity

There is little agreement on the mechanisms involved in second language (L2) processing of regular and irregular inflectional morphology and on the exact role of age, amount, and type of exposure to L2 resulting in differences in L2 input and use. The article contributes to the ongoing debates by reporting the results of two experiments on Russian verb generation and recognition in a lexical de...

Processing of Lexical Bundles by Persian Speaking Learners of English

Formulaic sequence (FS) is a general term often used to refer to various types of recurrent clusters. One particular type of FS common in different registers is the lexical bundle (LB). This study investigated whether LBs are stored and processed as a whole in the mind of language users and whether their functional discourse type has any effect on their processing. To serve these objectives, thr...

Publication date: 2002